Echos from the Black Box

Counterfactual Explanations and Probabilistic Methods for Trustworthy Machine Learning

Delft University of Technology

Arie van Deursen
Cynthia C. S. Liem

October 2, 2023

Quick Introduction

  • Recently entered the 3rd year of my PhD in Trustworthy Artificial Intelligence at Delft University of Technology.
  • Previously: educational background in Economics and Finance, followed by two years working on Monetary Policy at the Bank of England.
  • Interested in applying Trustworthy AI to real-world problems, particularly in the financial sector.

Background

Counterfactual Explanations

Born out of the need for explanations

Counterfactual Explanations (CE) explain how inputs into a model need to change for it to produce different outputs (Wachter, Mittelstadt, and Russell 2017).

Provided the changes are realistic and actionable, they can be used for Algorithmic Recourse (AR) to help individuals who face adverse outcomes.
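As a minimal sketch of gradient-based counterfactual search (illustrative only; not the CounterfactualExplanations.jl API), consider a logistic classifier with known weights and the Wachter-style objective that trades off validity against the cost of changing features:

```python
import numpy as np

# Gradient-based counterfactual search for a simple logistic classifier,
# following Wachter, Mittelstadt, and Russell (2017):
#   min_x'  yloss(M(x'), y+) + lambda * cost(x', x)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def counterfactual(x, w, b, y_target=1.0, lam=0.1, lr=0.5, steps=500):
    """Move a factual input x until the classifier predicts y_target."""
    x_cf = x.copy()
    for _ in range(steps):
        p = sigmoid(w @ x_cf + b)
        # gradient of binary cross-entropy w.r.t. the input features
        grad_yloss = (p - y_target) * w
        # gradient of the quadratic cost keeping x_cf close to x
        grad_cost = 2.0 * (x_cf - x)
        x_cf -= lr * (grad_yloss + lam * grad_cost)
    return x_cf

# toy 'loan denied' individual under a hypothetical credit model
w, b = np.array([1.5, -2.0]), 0.0
x = np.array([-1.0, 1.0])          # denied: p(y=1|x) < 0.5
x_cf = counterfactual(x, w, b)
print(sigmoid(w @ x_cf + b))       # now above the decision threshold
```

The penalty weight `lam` controls how close the counterfactual stays to the factual instance; realistic applications add constraints ensuring the suggested changes are actionable.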

Example: Consumer Credit

From ‘loan denied’ to ‘loan supplied’: CounterfactualExplanations.jl 📦.

Figure 1: Gradient-based counterfactual search.

Figure 2: Counterfactuals for Give Me Some Credit dataset (Kaggle 2011).

Example: Insurance Premium

  • Input \(\mathbf{X}\): A dataset of individuals containing demographic and financial information.
  • Additional Input \(\mathbf{Z}\): Individuals can opt-in to provide their personal Apple Health data to improve their chance of receiving a lower premium.
  • Binary output \(\mathbf{Y}\): based on the data, the individual is either eligible (\(y=1\)) or not eligible (\(y=0\)) for a lower premium.
  • To model \(p(y=1|X)\) the insurance provider can rely on an interpretable linear classifier.
  • To model \(p(y=1|X,Z)\) the insurance provider turns to a more accurate but less interpretable black-box model.

Example: Insurance Premium

In the EU, individuals have the right “[…] to obtain an explanation of the decision reached after such assessment and to challenge the decision.” (Recital 71 of the General Data Protection Regulation (GDPR))

In our example, who do you think is most likely to ask for an explanation?

You were promised some maths … 🌶️🌶️🌶️
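Following Wachter, Mittelstadt, and Russell (2017), the counterfactual search can be stated as (Equation 1):

\[
\mathbf{Z}^\prime = \arg\min_{\mathbf{Z}^\prime} \left\{ \text{yloss}(M_{\theta}(f(\mathbf{Z}^\prime)),\mathbf{y}^+) + \lambda\, \text{cost}(f(\mathbf{Z}^\prime)) \right\} \qquad (1)
\]

where \(f\) maps the search space \(\mathbf{Z}^\prime\) to the feature space, \(M_{\theta}\) is the opaque model, \(\mathbf{y}^+\) is the desired target output, and \(\lambda\) scales the cost of changing features.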

But wait a second …

Equation 1 looks a lot like an adversarial attack (Goodfellow, Shlens, and Szegedy 2014), doesn’t it?

Figure 3: Adversarial attack on an Image Classifier.

In both settings, we take gradients with respect to features \(\nabla_{\mathbf{Z}^\prime}\text{yloss}(M_{\theta}(f(\mathbf{Z}^\prime)),\mathbf{y}^+)\) in order to trigger changes in the model’s output.
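The shared mechanic can be sketched for a logistic model (illustrative only, not a specific attack or recourse library): both settings differentiate a loss with respect to the *inputs* of a fixed model; they differ in direction and intent.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def input_gradient(x, w, b, y):
    """d/dx of binary cross-entropy yloss(M(x), y) for a logistic model."""
    return (sigmoid(w @ x + b) - y) * w

w, b = np.array([2.0, -1.0]), 0.0
x = np.array([1.0, 0.5])          # correctly classified as y = 1

# Adversarial attack: *ascend* the loss on the true label y = 1
x_adv = x + 0.5 * input_gradient(x, w, b, y=1.0)

# Counterfactual search: *descend* the loss on the target label y+ = 0
x_cf = x - 0.5 * input_gradient(x, w, b, y=0.0)

# Both moves push the prediction towards y = 0; the difference is intent:
# fooling the model versus explaining it.
print(sigmoid(w @ x_adv + b), sigmoid(w @ x_cf + b))
```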

Gradient Descent Visualized

Figure 4: Gradient-based counterfactual search.

Research Questions

Recourse Dynamics

We present evidence suggesting that state-of-the-art applications of Algorithmic Recourse to groups of individuals induce large domain and model shifts and propose ways to mitigate this (IEEE SaTML paper).

Joint work with Giovan Angela, Karol Dobiczek, Aleksander Buszydlik, Arie van Deursen and Cynthia C. S. Liem (all TU Delft).

A Balancing Act

  • Minimizing private costs generates external costs for other stakeholders.
  • To avoid this, counterfactuals need to be plausible, i.e. comply with the data-generating process.
  • In practice, costs to various stakeholders need to be carefully balanced.

Is plausibility really all we need?

Pick your Poison?

All of these counterfactuals are valid explanations for the model’s prediction. Which one would you pick?

Figure 5: Turning a 9 into a 7: Counterfactual Explanations for an Image Classifier.

What do Models Learn?

These images are sampled from the posterior distribution learned by the model. Looks different, no?

Figure 6: Conditionally Generated Images from the Image Classifier

ECCCos from the Black Box

We propose a framework for generating Energy-Constrained Conformal Counterfactuals (ECCCos) which explain black-box models faithfully.

Joint work with Mojtaba Farmanbar (ING), Arie van Deursen (TU Delft) and Cynthia C. S. Liem (TU Delft).

Figure 7: Gradient fields and counterfactual paths for different generators.

How can we make models more trustworthy?

Conformal Prediction

Conformal Prediction is a model-agnostic, distribution-free approach to Predictive Uncertainty Quantification: ConformalPrediction.jl 📦.
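The idea can be sketched in a few lines of split conformal prediction for regression (illustrative only; not the ConformalPrediction.jl API). Under exchangeability, the resulting intervals cover the true outcome with probability of roughly \(1-\alpha\), for *any* point predictor:

```python
import numpy as np

rng = np.random.default_rng(0)

def model(x):                        # any fixed point predictor works
    return 2.0 * x

# held-out calibration set: y = 2x + noise
x_cal = rng.uniform(0, 1, 1000)
y_cal = 2.0 * x_cal + rng.normal(0, 0.1, 1000)

alpha = 0.1                                       # target miscoverage
scores = np.abs(y_cal - model(x_cal))             # nonconformity scores
n = len(scores)
# finite-sample-corrected quantile of the calibration scores
q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n)

# prediction interval for a new point
x_new = 0.5
lo, hi = model(x_new) - q, model(x_new) + q
print(lo, hi)
```

Note that the model itself is never retrained; conformal prediction only post-processes its outputs, which is what makes the approach model-agnostic.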

Figure 8: Conformal Prediction intervals for regression.

Figure 9: Conformal Prediction sets for an Image Classifier.

Joint Energy Models

Joint Energy Models (JEMs) are hybrid models trained to learn the conditional output and input distribution (Grathwohl et al. 2020): JointEnergyModels.jl 📦.
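The core reinterpretation can be sketched as follows (a simplified illustration, not the JointEnergyModels.jl implementation; the real training loop draws the contrastive samples via SGLD, which is omitted here):

```python
import numpy as np

def logits(x, W, b):                 # any classifier producing logits
    return x @ W + b

def log_softmax(z):
    m = z.max(axis=-1, keepdims=True)
    return z - m - np.log(np.exp(z - m).sum(axis=-1, keepdims=True))

def energy(x, W, b):
    # JEM reinterprets the classifier: E(x) = -logsumexp_y f(x)[y],
    # so p(x) ∝ exp(-E(x)) is an unnormalised density over inputs.
    z = logits(x, W, b)
    m = z.max(axis=-1, keepdims=True)
    return -(m.squeeze(-1) + np.log(np.exp(z - m).sum(axis=-1)))

def jem_loss(x_real, y_real, x_sampled, W, b):
    # discriminative part: standard cross-entropy on p(y|x)
    logp = log_softmax(logits(x_real, W, b))
    ce = -logp[np.arange(len(y_real)), y_real].mean()
    # generative part: push energy of real data below energy of samples
    gen = energy(x_real, W, b).mean() - energy(x_sampled, W, b).mean()
    return ce + gen

rng = np.random.default_rng(1)
W, b = rng.normal(size=(2, 3)), np.zeros(3)
x_real = rng.normal(size=(8, 2))
y_real = rng.integers(0, 3, size=8)
x_sampled = rng.normal(size=(8, 2))   # stand-in for SGLD samples
loss = jem_loss(x_real, y_real, x_sampled, W, b)
```

Minimizing the combined loss trains one set of weights to serve as both a classifier \(p(y|x)\) and a generative model of \(p(x)\).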

Figure 10: A JEM trained on Circles data.

Trustworthy AI in Julia

🐶 Taija

Research informs development, development informs research.

Trustworthy Artificial Intelligence in Julia.

Taija is a collection of open-source packages for Trustworthy AI in Julia. Our goal is to help researchers and practitioners assess the trustworthiness of predictive models.

Our work has been presented at JuliaCon 2022 and will be presented again at JuliaCon 2023 and hopefully beyond.

Questions?


References

Goodfellow, Ian J, Jonathon Shlens, and Christian Szegedy. 2014. “Explaining and Harnessing Adversarial Examples.” https://arxiv.org/abs/1412.6572.
Grathwohl, Will, Kuan-Chieh Wang, Joern-Henrik Jacobsen, David Duvenaud, Mohammad Norouzi, and Kevin Swersky. 2020. “Your Classifier Is Secretly an Energy Based Model and You Should Treat It Like One.” In International Conference on Learning Representations. https://openreview.net/forum?id=Hkxzx0NtDB.
Kaggle. 2011. “Give Me Some Credit, Improve on the State of the Art in Credit Scoring by Predicting the Probability That Somebody Will Experience Financial Distress in the Next Two Years.” Kaggle. https://www.kaggle.com/c/GiveMeSomeCredit.
Spooner, Thomas, Danial Dervovic, Jason Long, Jon Shepard, Jiahao Chen, and Daniele Magazzeni. 2021. “Counterfactual Explanations for Arbitrary Regression Models.” https://arxiv.org/abs/2106.15212.
Wachter, Sandra, Brent Mittelstadt, and Chris Russell. 2017. “Counterfactual Explanations Without Opening the Black Box: Automated Decisions and the GDPR.” Harv. JL & Tech. 31: 841.